AITopics | necessary condition

Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action

Neural Information Processing Systems, Jun-15-2026, 05:40:24 GMT

Recovering the dynamics of a high-dimensional system from a few snapshots is a challenging task in statistical physics and machine learning, with important applications in computational biology. Many algorithms have been developed to tackle this problem, based on frameworks such as optimal transport and the Schrödinger bridge. A notable recent framework is Regularized Unbalanced Optimal Transport (RUOT), which integrates both stochastic dynamics and unnormalized distributions. However, because many existing methods do not explicitly enforce optimality conditions, their solutions often struggle to satisfy the principle of least action and have difficulty converging stably and reliably. To address these issues, we propose Variational RUOT (Var-RUOT), a new framework for solving the RUOT problem. By incorporating the necessary optimality conditions of the RUOT problem into both the parameterization of the search space and the design of the loss function, Var-RUOT needs to learn only a scalar field to solve the RUOT problem and can find solutions with lower action. We also examine the challenge of selecting a growth penalty function in the widely used Wasserstein-Fisher-Rao metric and propose a solution in Var-RUOT that better aligns with biological priors.

artificial intelligence, deep learning, machine learning, (19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Assessing model calibration with boosting trees

Gatti, Selim

arXiv.org Machine Learning, Jun-9-2026

In regression modelling, the primary objective is to approximate the true conditional mean of a response given a set of features. To this end, various statistical models are used to fit a regression function that provides a mean estimate for each set of features. This function is said to be calibrated if the resulting mean estimates match the true conditional means for almost all features. Exact calibration does not seem achievable in practice, as models are fitted on finite samples of noisy observations. A weaker notion of calibration is auto-calibration (sometimes also called mean-calibration or well-calibration); see, for example, Krüger-Ziegel [22] and Denuit et al. [7]. This notion goes back to earlier works on the reliability of probabilistic forecasts in meteorology; we refer to Bross [2], Sanders [26] and Murphy-Winkler [23]. It means that when responses are grouped according to their mean estimates, the average of the responses within each group matches this estimate. This property is important in various applications where sums of mean estimates have to match sums of responses at both a global and a local level. This is, for example, the case in insurance pricing, as an auto-calibrated pricing system avoids systematic cross-subsidy between different price cohorts; we refer the reader to Pohle [24], Denuit et al. [6], Fissler et al. [9] and Wüthrich-Merz [30].
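The grouping idea behind auto-calibration can be illustrated with a minimal sketch. This is not the boosting-tree assessment from the paper; it is a plain grouped-average check on simulated data, and the function names `group_means` and `max_calibration_gap`, the three price levels, and the noise level are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def group_means(estimates, responses):
    """Average the responses within each group of identical mean estimates."""
    groups = defaultdict(list)
    for m, y in zip(estimates, responses):
        groups[m].append(y)
    return {m: sum(ys) / len(ys) for m, ys in groups.items()}

def max_calibration_gap(estimates, responses):
    """Largest absolute difference between a mean estimate and its group average."""
    return max(abs(m - avg) for m, avg in group_means(estimates, responses).items())

# Simulated data (hypothetical): three true conditional means plus Gaussian noise.
random.seed(0)
true_means = [random.choice([1.0, 2.0, 3.0]) for _ in range(30000)]
responses = [mu + random.gauss(0.0, 0.5) for mu in true_means]

# An auto-calibrated model returns the true means; a biased one inflates them by 20%.
calibrated = true_means
biased = [1.2 * mu for mu in true_means]

gap_calibrated = max_calibration_gap(calibrated, responses)
gap_biased = max_calibration_gap(biased, responses)
print(f"calibrated gap: {gap_calibrated:.3f}, biased gap: {gap_biased:.3f}")
```

In this toy setting the calibrated estimates show a gap close to zero, while the inflated estimates overshoot every group average, which is the kind of systematic cross-subsidy the abstract describes. With continuous estimates, exact grouping would have to be replaced by binning or another smoothing step.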

artificial intelligence, calibration, machine learning, (17 more...)

2606.08084

Country: Europe > Austria (0.28)

Genre: Research Report (0.65)

Industry: Banking & Finance > Insurance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.59)

e3fea99df80195b316cefa7aa6099cd5-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 02:28:15 GMT

artificial intelligence, machine learning, optimization problem, (17 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

Deep Learning without Poor Local Minima

Kenji Kawaguchi

Neural Information Processing Systems, Apr-22-2026, 13:02:14 GMT

In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. With no unrealistic assumption, we first prove the following statements for the squared loss function of deep linear neural networks with any depth and any widths: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) there exist "bad" saddle points (where the Hessian has no negative eigenvalue) for the deeper networks (with more than three layers), whereas there is no bad saddle point for the shallow networks (with three layers). Moreover, for deep nonlinear neural networks, we prove the same four statements via a reduction to a deep linear model under the independence assumption adopted from recent work. As a result, we present an instance, for which we can answer the following question: how difficult is it to directly train a deep model in theory? It is more difficult than the classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima). Furthermore, the mathematically proven existence of bad saddle points for deeper models would suggest a possible open problem. We note that even though we have advanced the theoretical foundations of deep learning and non-convex optimization, there is still a gap between theory and practice.
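Statement 4 can be made concrete with a scalar toy model. The sketch below is an illustration under stated assumptions, not the paper's proof: it uses a deep linear "network" whose weights are scalars, with loss (w_k ... w_1 - 1)^2, and it assumes that two weights correspond to the shallow three-layer case and three weights to a deeper case. The helper names and the finite-difference step are hypothetical choices.

```python
def loss(weights):
    """Squared loss of a scalar deep linear model with input 1 and target 1."""
    product = 1.0
    for w in weights:
        product *= w
    return (product - 1.0) ** 2

def gradient(f, w, h=1e-3):
    """Central-difference gradient of f at the point w."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def hessian(f, w, h=1e-3):
    """Central-difference Hessian of f at the point w."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                v = list(w)
                v[i] += di
                v[j] += dj
                return f(v)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def curvature(H, v):
    """Quadratic form v^T H v, the second-order curvature along direction v."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

shallow_origin = [0.0, 0.0]      # two weights (assumed shallow case)
deep_origin = [0.0, 0.0, 0.0]    # three weights (assumed deeper case)

# Both origins are critical points with loss 1, while the global minimum is 0.
shallow_grad = gradient(loss, shallow_origin)
deep_grad = gradient(loss, deep_origin)

# Shallow case: negative curvature along (1, 1), so the saddle can be escaped.
shallow_curvature = curvature(hessian(loss, shallow_origin), [1.0, 1.0])

# Deeper case: the Hessian vanishes, so it has no negative eigenvalue.
deep_hessian = hessian(loss, deep_origin)
deep_max_entry = max(abs(e) for row in deep_hessian for e in row)

print(shallow_grad, shallow_curvature)
print(deep_grad, deep_max_entry)
```

In the two-weight model the Hessian at the origin has a direction of negative curvature, whereas in the three-weight model the Hessian is identically zero at a point that is not a global minimum. That second case is a "bad" saddle point in the abstract's sense.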

artificial intelligence, machine learning, theorem 2, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input

Neural Information Processing Systems, Mar-20-2026, 10:14:35 GMT

In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace. We establish a necessary condition for SGD-learnability, involving both the characteristics of the target function and the expressiveness of the activation function. In addition, we prove that the condition is almost sufficient, in the sense that a condition slightly stronger than the necessary condition can guarantee the exponential decay of the loss functional to zero.

artificial intelligence, machine learning, proceedings, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

e3fea99df80195b316cefa7aa6099cd5-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 15:44:02 GMT

artificial intelligence, machine learning, optimization problem, (16 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

c981fd12b1d5703f19bd8289da9fc996-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 02:19:48 GMT

artificial intelligence, machine learning, optimization, (16 more...)

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Maryland > Baltimore (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

A Generalized Alternating Method for Bilevel

Neural Information Processing Systems, Feb-17-2026, 02:19:44 GMT

Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) when addressing bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting.
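The basic setting mentioned above, an alternating implicit-gradient scheme with a strongly convex lower-level objective, can be sketched on a toy problem. This is not the generalized method proposed in the paper. The quadratic objectives f(x, y) = 0.5 (y - 1)^2 + 0.5 x^2 and g(x, y) = 0.5 (y - x)^2, the step sizes, and the function name `alternating_bilevel` are all assumptions chosen so that the answer is known in closed form (the lower-level solution is y = x and the bilevel optimum is x = 0.5).

```python
def alternating_bilevel(x=0.0, y=0.0, alpha=0.1, beta=0.5, steps=500):
    """Alternate one lower-level gradient step with one implicit-gradient upper step."""
    for _ in range(steps):
        # Lower-level step: gradient descent on g(x, y) = 0.5 * (y - x)^2 in y.
        y = y - beta * (y - x)

        # Partial derivatives of the upper-level f(x, y) = 0.5*(y - 1)^2 + 0.5*x^2.
        grad_x_f = x
        grad_y_f = y - 1.0

        # Second derivatives of g; hess_yy_g > 0 reflects strong convexity in y.
        hess_xy_g = -1.0
        hess_yy_g = 1.0

        # Implicit (hyper)gradient: df/dx - (d2g/dxdy) * (d2g/dy2)^-1 * df/dy.
        hypergrad = grad_x_f - hess_xy_g / hess_yy_g * grad_y_f

        # Upper-level step using the current, inexact lower-level iterate y.
        x = x - alpha * hypergrad
    return x, y

x_final, y_final = alternating_bilevel()
print(x_final, y_final)
```

Because the lower-level problem is strongly convex, the Hessian term is invertible and the hypergradient is well defined at every iterate. When that assumption fails, this formula is no longer available, which is the gap the abstract points to.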

artificial intelligence, machine learning, optimization, (16 more...)

Country: